What We'll Cover
In Week 5, we examined how AI can hallucinate citations — generating references that look real but point to papers that do not exist. This session tackles the parallel problem on the writing side: how AI-assisted writing can compromise scientific integrity in ways that range from subtle to severe.
The core tension is straightforward. AI can genuinely help you communicate your research more clearly. It can also produce text that sounds authoritative while misrepresenting your data, overstating your claims, or presenting arguments you do not actually understand. The line between these outcomes is not always obvious, and it shifts depending on how you use the tool and how honest you are with yourself about the process.
We will map the integrity spectrum from low-risk to high-risk uses, examine what major publishers now require, confront the limitations of AI detection tools, and equip you with practical techniques for auditing your own AI-assisted work. The goal is not to scare you away from using AI for writing — it is to help you use it in ways you can defend with confidence.
📊 The Integrity Spectrum
Not all AI writing assistance carries the same ethical weight. The following spectrum moves from minimal integrity concern to serious risk of research misconduct. Where you draw your personal line matters — but understanding the full spectrum is essential.
Grammar and Spelling Fixes
Risk level: Minimal
Using AI to catch typos, fix subject-verb agreement, or correct punctuation is the modern equivalent of running a spellchecker. The content, arguments, and meaning remain entirely yours. No intellectual contribution shifts to the AI. This is where most researchers feel comfortable, and most journal policies explicitly permit it.
The key principle: surface-level corrections that do not alter meaning carry negligible integrity risk.
Paraphrasing Your Own Text
Risk level: Low to moderate
Asking AI to rephrase a sentence you have already written for clarity or concision is generally low-risk — provided the meaning stays the same. The concern rises when AI paraphrasing subtly shifts your claims. A sentence that said "our results suggest X" might become "our results demonstrate X" — a small change with significant scientific implications. Always read the AI's version against your original meaning, not just for fluency.
Restructuring Arguments
Risk level: Medium
When AI reorganises the flow of your discussion section or reorders your argument, something important shifts: the structure of your reasoning is no longer entirely yours. Argument structure is not neutral — the order in which you present evidence shapes how readers interpret its significance. If AI restructures your argument and you accept the new order without deeply understanding why it is better, you have ceded part of your intellectual contribution.
The test: can you explain why this structure is better than your original, or did you just accept it because it reads more smoothly?
Generating Text from Your Notes
Risk level: High
This is where the boundary becomes genuinely blurred. You provide bullet points, outlines, or rough notes, and AI generates polished prose. The ideas may have started with you, but the expression, emphasis, and framing are the AI's. In academic writing, how you say something is often inseparable from what you mean. If you cannot rewrite the AI-generated paragraph from scratch — expressing the same ideas in your own words — then the AI's contribution to the intellectual work is more significant than it may appear.
Generating Arguments
Risk level: Very high
When AI generates the arguments themselves — not just expressing your ideas more clearly, but producing lines of reasoning you had not considered — the intellectual contribution has fundamentally shifted. These are no longer your scholarly contributions in any meaningful sense. Using AI-generated arguments without disclosure is analogous to using a ghostwriter: someone (or something) else did the intellectual heavy lifting, and you are presenting the result as your own thinking.
Generating Data Descriptions
Risk level: Variable — depends entirely on context
This is the most nuanced use case on the spectrum. It requires distinguishing two very different scenarios: asking AI to describe data it can actually see, and asking it to describe data it has never seen.
The good news: Modern AI models are better at this than many people assume. If you ask a well-designed model "I conducted a randomised controlled trial of a mindfulness intervention on undergraduate stress levels with 120 participants — describe the results," it will typically push back and ask for the actual data, the outcome measures, the effect sizes, and the statistical tests. It will not simply fabricate a results section. This is a significant improvement over earlier models.
The remaining risks are subtler: The outright fabrication scenario — AI inventing results from nothing — is becoming less likely with better models. But the more dangerous risks persist:
- Instructed fabrication: If you prompt "Write a results section showing that mindfulness reduced stress," the AI will comply — it does what you ask. The fabrication is in the prompt, not the model.
- Over-claiming: When given real data, AI may present a weak trend as a strong finding, or describe a non-significant result in language that implies significance.
- Silent gap-filling: AI may have partial information and fill in gaps without flagging that it is doing so. This is harder to catch than outright invention.
- Missing what matters: AI describes what is statistically obvious in the data, not what a domain expert would recognise as scientifically interesting or problematic.
With full data access (e.g. through Claude Code, a Jupyter notebook, or uploading files), AI can produce genuinely useful draft descriptions. The quality improves significantly with the amount of context you provide — experimental setup, research questions, field conventions, known confounders. But even with perfect context, your judgment as the domain expert is irreplaceable.
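As a concrete pattern for providing that context, you can compute the key numbers yourself and hand the model a factual summary to describe, leaving no gaps for it to fill silently. The sketch below is illustrative only: it assumes a pandas DataFrame from the mindfulness example above, and the column names and the `summarise_for_prompt` helper are ours, not any tool's API.

```python
# Minimal sketch: ground the prompt in the actual data, so the AI
# describes numbers you computed rather than numbers it imagined.
# Column names ("group", "stress_score") are assumed for illustration.
import pandas as pd
from scipy import stats

def summarise_for_prompt(df: pd.DataFrame) -> str:
    """Build a factual summary for the AI to describe."""
    control = df.loc[df["group"] == "control", "stress_score"]
    treated = df.loc[df["group"] == "mindfulness", "stress_score"]
    t, p = stats.ttest_ind(treated, control)
    return (
        f"N = {len(df)} ({len(treated)} mindfulness, {len(control)} control). "
        f"Mean stress: mindfulness {treated.mean():.2f} (SD {treated.std():.2f}), "
        f"control {control.mean():.2f} (SD {control.std():.2f}). "
        f"Independent-samples t = {t:.2f}, p = {p:.4f}."
    )

# The summary, not a bare claim, goes into the prompt:
# prompt = ("Draft a neutral description of these results, without "
#           f"overstating them: {summarise_for_prompt(trial_df)}")
```

Even then, the over-claiming and missing-what-matters risks above still apply: you must read the resulting draft against the numbers, not just for fluency.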
⚠️ When AI Knows More Than You
There is another dimension to consider: AI may genuinely know more than you about certain aspects of your subject. It has been trained on more papers than any human can read. This cuts both ways:
The positive side: AI can spot connections to literature you haven't encountered, suggest interpretations from adjacent fields, and fill gaps in your training. For early-career researchers, these can be genuinely valuable moments of learning.
The dangerous side: AI states everything with equal confidence, whether it's well-established consensus or something it's confabulating. If it "knows more than you," you are in the worst position to evaluate whether what it's telling you is correct. It can construct compelling narratives that sound authoritative but are subtly wrong — and if you lack the expertise to spot the error, you'll build on a flawed foundation.
The lesson: When AI teaches you something new, that is a starting point for verification, not a conclusion. The more you learn from AI, the more you need to verify independently — precisely because you're operating in the zone where you cannot easily distinguish correct from plausible. This connects directly to Week 4's virtue ethics: are you using AI to genuinely develop your understanding, or to create the appearance of understanding?
💡 The Key Principle
As you move down the spectrum, the AI's contribution shifts from form (how something is said) to substance (what is being said). Integrity concern scales with this shift. Improving form is assistance. Generating substance is co-authorship — or, without disclosure, misconduct.
A useful heuristic: if you removed the AI's contribution entirely, would the intellectual content of your paper change? If yes, the AI's role requires disclosure at minimum — and may require rethinking your approach entirely.
📜 Updated Journal Policies (2025–2026)
The publishing landscape has moved rapidly since early 2023. Major publishers have now established explicit policies on AI use, though they vary significantly in strictness. As a researcher, you are expected to know and comply with the policies of your target journal before submission.
| Publisher | Policy stance | AI Authorship? | Disclosure Required? | Key Details |
|---|---|---|---|---|
| Science (AAAS) | Strictest (disclosure-based since Nov 2023) | No | Yes — full prompt disclosure | Originally banned AI-generated text entirely (Jan 2023). Revised November 2023 to allow AI use with mandatory disclosure: AI tool and version in cover letter and acknowledgments, full prompts in methods section. Remains among the strictest policies in academic publishing. Authors are fully accountable for accuracy. |
| Nature (Springer Nature) | Middle ground | No | Conditional | Prohibits AI authorship and AI-generated images. Allows AI for copy editing without disclosure. Requires disclosure for any substantive use in methods or acknowledgments. Distinguishes between surface-level and substantive assistance. |
| Elsevier | Disclosure-focused | No | Yes — upon submission | Updated September 2025. Requires a disclosure statement upon submission. Allows AI use with human oversight and responsibility. Authors must describe where and how AI tools were used. |
| IEEE | Disclosure-focused | No | Yes | Requires disclosure of AI tool use. Prohibits listing AI as an author. Authors remain fully responsible for all content, including any AI-assisted portions. Applies across all IEEE publications. |
| ACM | Responsibility-centred | No | Yes | Requires disclosure. Authors must take full responsibility for all content, including AI-generated material. AI cannot fulfil the accountability requirements of authorship. Violation of disclosure constitutes a breach of ACM's publication ethics. |
| COPE Guidance | Baseline framework | No | Yes (recommended) | The Committee on Publication Ethics provides guidance that most publishers are converging toward: AI cannot be an author, transparency is required, and human responsibility for content is non-negotiable. Serves as the emerging baseline across the industry. |
📋 The Converging Baseline
Despite differences in strictness, a clear consensus is emerging across publishers: (1) AI cannot be listed as an author because it cannot take responsibility for content. (2) Disclosure of AI use is required or strongly recommended. (3) Human authors bear full responsibility for everything in the manuscript, including any AI-assisted content. (4) Violations are treated as breaches of publication ethics. If you follow these four principles, you will be compliant with most current policies.
⚠️ Policies Are Moving Targets
The policies summarised above reflect the state of play in early 2026. They will continue to evolve. Before submitting to any journal, check its current AI policy directly. Many publishers update their guidelines without issuing prominent announcements. The cost of non-compliance — retraction, investigation, reputational damage — vastly exceeds the cost of checking.
🔍 AI Detection — The Unreliable Arms Race
If journal policies represent the rules, AI detection tools represent the enforcement mechanism. Unfortunately, that mechanism is deeply flawed.
How Detection Tools Work
AI detection tools analyse text for statistical patterns associated with machine-generated writing. The two most common metrics are:
- Perplexity: How "surprising" the word choices are. AI tends to choose highly probable next words, producing text with low perplexity. Human writing is more variable and less predictable.
- Burstiness: The variation in sentence complexity. Human writing naturally alternates between short, punchy sentences and longer, more complex ones. AI-generated text tends to be more uniform in its sentence structure.
The fundamental problem: modern LLMs have become sophisticated enough to mimic human variation in both perplexity and burstiness. As models improve, the statistical signatures that detection tools rely on become less distinctive. It is a classic arms race, and detection tools are losing ground.
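To make these two signals concrete, here is a minimal sketch of how each can be approximated in Python. It assumes GPT-2 as a stand-in scoring model (commercial detectors use their own, undisclosed models) and a crude regex sentence splitter; the function names are ours, purely for illustration.

```python
# Minimal sketch: the two statistical signals described above.
# Assumes the Hugging Face transformers and torch packages; GPT-2 is
# only a stand-in for whatever model a real detector scores with.
import math
import re
import statistics

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """exp(mean next-token cross-entropy): lower = more predictable text.
    (For long texts you would window over 1,024-token chunks.)"""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return math.exp(loss.item())

def burstiness(text: str) -> float:
    """Coefficient of variation of sentence length: a crude proxy
    for how much the writing's rhythm varies."""
    lengths = [len(s.split())
               for s in re.split(r"[.!?]+\s*", text) if s.strip()]
    if len(lengths) < 2:
        return 0.0
    return statistics.stdev(lengths) / statistics.mean(lengths)
```

Note how blunt these instruments are: a careful human writer with a measured, even style scores much like a machine, which is precisely the false-positive problem discussed below.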
📄 Key Reading on Detection
Pangram Labs (2025): "The State of Academic Integrity & AI Detection" — A comprehensive overview of the current state of AI detection in academic settings, including accuracy rates, failure modes, and the limitations of current approaches.
False Positives
Detection tools flag human-written text as AI-generated more often than most people realise. Formal academic writing, in particular, shares many statistical features with AI output: measured tone, structured argumentation, standard disciplinary phrasing. Non-native English speakers are disproportionately affected because their writing sometimes uses simpler, more predictable sentence structures — precisely the patterns detection tools associate with AI.
The consequence: students and researchers can be wrongly accused of AI use, with serious academic consequences, based on unreliable tools.
False Negatives
Conversely, AI-generated text that has been lightly edited by a human often evades detection entirely. Simple techniques — changing a few words per paragraph, varying sentence length, introducing deliberate imperfections — can push AI-generated text below detection thresholds. The more sophisticated the writer, the easier it is to defeat detection.
The consequence: detection tools create a false sense of security. Institutions that rely on them as their primary defence against misuse are building on an unstable foundation.
Bias Against Non-Native Speakers
Multiple studies have shown that AI detection tools have higher false positive rates for text written by non-native English speakers. This creates a structural equity problem: researchers from non-Anglophone backgrounds are more likely to be wrongly flagged, investigated, and penalised. In a global academic community, a detection system that disproportionately punishes people based on their first language is not a neutral tool — it is a biased one.
⚠️ The Real Risk Is Not Getting Caught
It is tempting to frame AI writing integrity as a detection problem: how do we catch people who use AI without disclosure? But this framing misses the deeper concern. The real risk is self-deception — submitting AI-polished work that you do not fully understand, defending claims you did not fully reason through, and presenting arguments you cannot independently reproduce. Detection tools cannot measure understanding. Only you can assess whether you truly own the intellectual content of your work. Integrity is not about evading detection. It is about honest engagement with your own research.
💡 Detection Is Not the Defence
AI detection should not be the primary mechanism for maintaining research integrity. Detection tools are too unreliable, too biased, and too easily circumvented to serve as a trustworthy enforcement mechanism. The real defence is cultural: building norms of honest disclosure, developing personal integrity practices, and creating academic environments where transparent AI use is supported rather than punished.
Integrity comes from the writer, not the detector.
🔎 Practical Auditing Techniques
Rather than relying on detection tools, develop personal auditing practices that ensure you genuinely own every part of your AI-assisted writing. These techniques work regardless of which AI tool you used or how sophisticated detection technology becomes.
The Read-Aloud Test
Method: Read your AI-assisted text out loud, slowly.
Does it sound like you? Would a colleague recognise your voice in this writing? AI-generated prose often has a distinctive quality — fluent but generic, correct but impersonal. If the text sounds like it could have been written by anyone in your field, it probably was not written by you in any meaningful sense. Your writing should carry your perspective, your emphasis, and your way of connecting ideas.
What to look for: Phrases you would never naturally use. Arguments structured in a way that does not match how you think. A level of polish that exceeds what you can produce independently.
The "Explain This Paragraph" Test
Method: Pick a paragraph in your text at random. Without looking at it, explain the reasoning behind it.
This is the most powerful single test for AI-assisted writing integrity. If you wrote or deeply engaged with every paragraph, you should be able to explain the logic, the evidence, and the connection to surrounding paragraphs from memory. If you find yourself needing to re-read the paragraph to understand what it says, you did not write it in any intellectually honest sense — you accepted it.
Source Verification
Method: Check every factual claim against your actual sources.
This connects directly to the hallucination crisis we examined in Week 5. AI-assisted writing can introduce subtle factual errors: slightly wrong statistics, misattributed findings, overstated conclusions. Every claim in your text should trace back to a source you have actually read. If AI suggested a claim and you cannot find it in your sources, it may not be true — regardless of how plausible it sounds.
Internal Consistency Check
Method: Verify that all sections of your paper agree with each other.
When different sections are written or revised at different times — especially with AI assistance — internal inconsistencies can creep in. Does your methods section describe the same procedure your results section reports? Does your introduction promise what your conclusion delivers? Do the limitations you acknowledge actually correspond to the methods you used? AI does not maintain awareness across your entire manuscript. You must.
The "Defend It" Test
Method: Imagine a sceptical reviewer challenging every claim in your manuscript.
For each claim, can you explain the evidence, address obvious counterarguments, and defend your interpretation from your own understanding? This is the ultimate test because it mirrors what actually happens during peer review and thesis defence. If you used AI to generate an argument and cannot defend it under questioning, you will be exposed — not by a detection tool, but by the normal processes of academic scrutiny.
🔧 Making Auditing a Habit
These techniques are most effective when they become routine rather than afterthoughts. Consider building auditing into your writing workflow: after each AI-assisted revision round, run through these five checks before moving on. The time investment is modest compared to the cost of submitting work you cannot fully defend.
💾 Data Integrity and AI-Enabled Fabrication
Beyond writing, AI introduces new risks to data integrity that the research community is only beginning to reckon with. The tools that make research more efficient can also make fabrication more sophisticated and harder to detect.
📄 Key Reading on AI and Scientific Integrity
Frontiers in AI (2025): "AI for scientific integrity: detecting ethical breaches, errors, and misconduct" — An overview of how AI is being used both to commit and to detect scientific misconduct, covering synthetic data generation, image manipulation, statistical fabrication, and emerging detection methods.
The Fabrication Problem
AI-enabled fabrication is becoming more sophisticated across multiple dimensions of research output:
- Synthetic data generation: AI can produce datasets that exhibit realistic statistical properties — appropriate distributions, plausible correlations, expected effect sizes — without any underlying experimental observation. These datasets can be extremely difficult to distinguish from real data through conventional review.
- Plausible results narratives: When asked to describe experimental results, AI generates confident, well-structured descriptions of trends and findings. These descriptions sound authoritative precisely because they are based on patterns the model learned from thousands of real results sections — but they describe data that does not exist.
- Image manipulation: AI tools can generate, modify, or enhance scientific images in ways that are increasingly difficult to detect visually. Western blots, microscopy images, and graphs can all be fabricated or manipulated with sophisticated AI tools.
- Statistical fabrication: AI can generate statistical results — p-values, confidence intervals, effect sizes — that appear internally consistent and plausible, but correspond to no actual analysis.
Detection Tools for Data Integrity
A growing ecosystem of tools is emerging to combat AI-enabled fabrication:
- Statcheck: Automatically checks whether reported statistics (test statistics, degrees of freedom, p-values) are internally consistent. Cannot detect fabrication directly, but catches many errors and inconsistencies; a minimal version of this check is sketched below.
- Proofig / FigCheck: Image integrity tools that detect duplication, manipulation, and reuse across figures within and between manuscripts.
- The Black Spatula Project: Community-driven effort focused on detecting mathematical errors and inconsistencies in published research, often revealing patterns suggestive of fabrication.
These tools are useful but imperfect. The most effective defence remains rigorous self-auditing and honest research practice.
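To see what a Statcheck-style check actually does, here is a minimal sketch in Python (Statcheck itself is an R package; the function name and tolerance below are ours). It recomputes the two-tailed p-value implied by a reported t statistic and degrees of freedom, and flags any mismatch with the reported p:

```python
# Minimal sketch of the consistency check Statcheck automates:
# recompute the p-value implied by a reported test statistic and
# degrees of freedom, then compare it to the reported p.
from scipy import stats

def check_t_test(t: float, df: int, reported_p: float,
                 tol: float = 0.005) -> bool:
    """Two-tailed p implied by t(df); flags a mismatch with the report.
    The tolerance allows for ordinary rounding in the manuscript."""
    implied_p = 2 * stats.t.sf(abs(t), df)
    consistent = abs(implied_p - reported_p) <= tol
    if not consistent:
        print(f"t({df}) = {t}: implied p = {implied_p:.4f}, "
              f"reported p = {reported_p}")
    return consistent

# "t(118) = 2.10, p = .04" checks out (implied p is roughly 0.038);
# the same statistic with "p = .01" is flagged as inconsistent.
check_t_test(2.10, 118, 0.04)   # True
check_t_test(2.10, 118, 0.01)   # False
```

The same idea extends to F, correlation, and chi-square tests: reported statistics over-determine one another, so mistyped or fabricated numbers frequently fail to agree.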
The Broader Picture
AI-enabled fabrication is not just a technology problem — it is a symptom of systemic pressures. When academic careers depend on publication volume, when funding is tied to producing results, and when the costs of fabrication seem low relative to the rewards, the availability of sophisticated AI tools makes fabrication easier and more tempting.
Addressing this requires more than better detection. It requires examining the incentive structures that make fabrication attractive in the first place, and building research cultures where transparency and honest null results are valued.
🧪 Try It Yourself: The Fabrication Demonstration
To see how your chosen model handles a request to describe data it cannot see, and how readily fabrication can follow, try this exercise with any general-purpose chatbot:
Prompt: "I conducted a randomised controlled trial of a mindfulness intervention on undergraduate stress levels, with 120 participants. Describe the results."
Notice what happens. A well-designed modern model will typically push back and ask for your actual data, as discussed in the integrity spectrum above. But a weaker model, or one pressed with a follow-up such as "just write plausible results anyway", will generate a complete results section with specific effect sizes, p-values, group comparisons, and statistical conclusions, all entirely fabricated. The output sounds authoritative because the model has learned the patterns of how results sections are written, yet it has no access to any actual data. Every number it produces is a hallucination dressed up as a finding.
This is why asking AI to describe data it has never seen sits at the fabrication end of the integrity spectrum. It is not assistance; it is fabrication.
📝 Disclosure Templates
Transparent disclosure is the practical foundation of AI writing integrity. The following templates provide starting points for different levels of AI use. Adapt them to match your actual usage and your target journal's specific requirements.
📄 Disclosure Guidance
AMEE Guide No.192 (2025): "When and how to disclose AI use in academic publishing" — A comprehensive guide to AI disclosure in academic contexts, including when disclosure is necessary, what to include, and how to frame it. Provides structured templates that many journals are beginning to reference.
✅ Template 1: Minimal Use (Grammar and Proofreading)
"[Tool name, e.g., ChatGPT-4o / Claude 3.5 Sonnet / Grammarly] was used for grammar checking and proofreading of this manuscript. All intellectual content, including research design, data analysis, interpretation of results, and argumentation, was produced entirely by the authors. No AI-generated text was incorporated into the substantive content of this work."
📐 Template 2: Moderate Use (Structuring, Feedback, Language Polishing)
"The authors used [tool name and version] during the preparation of this manuscript for the following purposes: [list specific uses, e.g., improving clarity of expression, suggesting structural revisions to the discussion section, language polishing for non-native English expression]. All AI-assisted text was reviewed, verified, and substantially revised by the authors. The authors take full responsibility for the content of this publication, including any portions that were refined with AI assistance. [If applicable: Prompts used are available in the supplementary materials.]"
📑 Template 3: Substantial Use (Drafting Assistance, Ideation)
"[Tool name and version] was used substantially during the preparation of this manuscript, including [list specific uses, e.g., generating initial draft text from author-provided outlines, brainstorming argument structures, synthesising literature themes]. All AI-generated content was critically reviewed, verified against primary sources, and revised by the authors. Sections that involved significant AI contribution include [specify sections]. The authors affirm that they understand and can defend all content in this manuscript, and take full responsibility for its accuracy and integrity. Complete prompts and AI interaction logs are available upon request / in the supplementary materials."
💡 Practice What We Preach
It is worth noting that these course materials themselves fall under Template 3. As stated at the top of every page: this content has been created and enhanced using Claude. The structure, research, and drafting involved substantial AI assistance, while the intellectual direction, pedagogical decisions, verification of all claims, and final editorial judgment remained with the course instructor. We believe in transparency about our own AI use — and we encourage you to adopt the same standard.
⚠️ Disclosure Is Not Absolution
A disclosure statement does not automatically make any level of AI use acceptable. Some journals (notably Science) prohibit AI-generated text regardless of disclosure. Others may accept disclosed use but still expect the intellectual contribution to be demonstrably the authors'. Disclosure is necessary but not sufficient — you must also ensure that your use of AI is consistent with your target journal's specific policy and with the norms of your discipline.
📚 Readings
Three core readings for this sub-lesson, all freely accessible.
📄 Core Reading 1
Elsevier (2025): Generative AI policies for journals — One of the most detailed publisher-level AI policies currently available. Read it as a model for understanding what publishers expect, and as a reference point for your own disclosure practices.
📄 Core Reading 2
AMEE Guide No.192 (2025): "When and how to disclose AI use in academic publishing" — Practical guidance on the mechanics of AI disclosure: when it is required, what to include, how to format it, and how to make disclosure meaningful rather than performative.
📄 Core Reading 3
Frontiers in AI (2025): "AI for scientific integrity: detecting ethical breaches, errors, and misconduct" — An overview of the dual role AI plays in scientific integrity — both as a source of new risks and as a tool for detecting misconduct. Useful for understanding the broader landscape of AI and research integrity.
📚 Summary & Key Takeaways
This session mapped the territory where AI writing assistance intersects with scientific integrity — from the low-risk terrain of grammar fixes to the high-risk zone of data fabrication.
- The integrity spectrum is real: Not all AI writing assistance carries the same ethical weight. Understanding where your use falls on the spectrum — from surface-level corrections to substantive content generation — is the first step toward honest practice
- Journal policies are converging: No major publisher accepts AI as an author. Most require disclosure. All hold human authors responsible for every word. Know your target journal's specific policy before you submit
- Detection tools are unreliable: AI detection is too inaccurate, too biased, and too easily circumvented to serve as the primary defence against misuse. Do not build your integrity practice around avoiding detection
- Personal auditing is essential: The read-aloud test, the explain-it test, source verification, internal consistency checks, and the defend-it test are your real safeguards. Build them into your workflow
- Data integrity requires vigilance: AI can fabricate plausible-sounding data, results, and statistical analyses. Never use AI to describe data it has not seen
- Disclosure is necessary but not sufficient: Transparent disclosure is the baseline, but it does not automatically make any level of AI use acceptable. Your use must also be consistent with your journal's policy and your discipline's norms
Next session: In Sub-Lesson 5 (Building Your AI Writing Workflow), we bring together everything from this week — the writing process, discipline-specific applications, collaborative workflows, and integrity practices — into a practical, personalised workflow that you can use for your own research writing.